We show that protein-protein interaction networks can be described surprisingly well by a very simple evolution model of duplication and divergence. The model exhibits remarkably rich behavior depending on a single parameter, the probability to retain a duplicated link during divergence. When this parameter is large, the network growth is not self-averaging and the average vertex degree increases algebraically. The lack of self-averaging results in a great diversity of networks grown from the same initial condition. For small values of the link retention probability, the growth is self-averaging, the average degree increases very slowly or tends to a constant, and the degree distribution has a power-law tail.
I. INTRODUCTION

Single- and multi-gene duplication plays a crucial role in evolution [1, 2]. On the proteomic level, gene duplication leads to the creation of new proteins that are initially identical to the original ones. In the course of subsequent evolution, the majority of these new proteins are lost as redundant, while some of them survive by diverging, i.e. quickly losing old and possibly slowly acquiring new functions.

The protein-protein interaction network is commonly defined as an evolving graph with nodes and links corresponding to proteins and their interactions. A successful single-gene duplication event thus creates a new node which is initially linked to all the neighbors of the original node. Later, some links between each of the duplicates and their neighbors disappear (Fig. 1). Such a network evolution process is commonly called duplication and divergence [?]. Although duplication and divergence is usually considered as a growth mechanism only for protein-protein networks, it may also play a role in the creation of certain new nodes and links in the world wide web, in the growth of various networks of human contacts through the introduction of close acquaintances of existing members, and in the evolution of many other non-biological networks.

FIG. 1: A sketch of a duplication and divergence event. Links between the duplicated vertex and vertices 3 and 4 disappeared as a result of divergence.

Does evolution dominated by duplication and divergence define the structure and other properties of a network? So far, most of the attention has been devoted to the study of the degree distribution $n_k$, which is the probability for a vertex to have $k$ links. Wagner [?] has provided numerical evidence that duplication-divergence evolution does not noticeably alter the initial power-law degree distribution, provided that the evolution is initiated with a fairly large network. A somewhat idealized case of completely asymmetric divergence [?], when links are removed only from one of the duplicates (as in Fig. 1), was investigated in Refs. [5, 6]. It was found that the emerging degree distribution has a power-law tail: $n_k \sim k^{-\gamma}$ for $k \gg 1$. Yet apart from the shape of the degree distribution, a number of other, perhaps even more fundamental, properties of duplication-divergence networks remain unclear:

1. How well does the model describe its natural prototype, the protein-protein networks?

2. Is the total number of links a self-averaging quantity?

3. How does the average total number of links depend on the network size $N$?

4. Does the degree distribution scale linearly with $N$?

A non-trivial answer to any of these questions would be much 
more important than details of the tail of the degree distribu- 
tion; the reason why only these details are usually studied is 
that the more fundamental questions are assumed to have triv- 
ial answers. 

Here we shall attempt to answer the above questions, and we shall also look again at the degree distribution of the duplication-divergence networks. As in [5], we consider a simple scenario of totally asymmetric divergence, where evolution is characterized by a single parameter, the link retention probability $\sigma$. It turns out that even such an idealized model describes the degree distribution found in biological protein-protein networks very well. We find that, depending on $\sigma$, the behavior of the system is extremely diverse: when more than half of the links are (on average) preserved, the network growth is non-self-averaging, the average degree diverges with the network size, and while the degree distribution has a scaling form, it does not resemble any power law. In the complementary case of small $\sigma$, the growth is self-averaging, the average degree tends to a constant, and the degree distribution approaches a scaling power-law form.

In the next section we formally define the model and compare the simulated degree distributions to the observed ones. The properties of the model are first analyzed in the tractable $\sigma = 1$ and $\sigma \to +0$ limits (Sec. III) and then in the general case $0 < \sigma < 1$ (Sec. IV). Section V gives conclusions.

II. DUPLICATION AND DIVERGENCE 

To keep the matter as simple as possible, we focus on the completely asymmetric version of the duplication and divergence model of network growth. The model is defined as follows (Fig. 1):

1. Duplication. A randomly chosen target node is duplicated, that is, its replica is introduced and connected to each neighbor of the target node.

2. Divergence. Each link emanating from the replica is activated with probability $\sigma$ (this mimics link disappearance during divergence). If at least one link is established, the replica is preserved; otherwise the attempt is considered a failure and the network does not change. (The probability of failure is $(1-\sigma)^k$ if the degree of the target node is equal to $k$.)
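The two rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the simulations reported here: the function names `grow_network` and `mean_degree`, the adjacency-set representation, and the seeding scheme are assumptions made for the example, and the starting network is the two connected nodes used throughout the paper.

```python
import random


def grow_network(n_nodes, sigma, seed=None):
    """Grow a duplication-divergence network with n_nodes nodes.

    Returns the adjacency structure as a list of sets (hypothetical layout
    chosen for this sketch). Requires 0 < sigma <= 1, otherwise no
    duplication attempt can ever succeed.
    """
    if not 0.0 < sigma <= 1.0:
        raise ValueError("sigma must lie in (0, 1]")
    rng = random.Random(seed)
    adj = [{1}, {0}]  # initial network: two connected nodes
    while len(adj) < n_nodes:
        # Duplication: pick a random target node.
        target = rng.randrange(len(adj))
        # Divergence: each link of the replica survives with probability sigma.
        kept = [v for v in sorted(adj[target]) if rng.random() < sigma]
        if not kept:
            continue  # failed attempt, the network does not change
        new = len(adj)
        adj.append(set(kept))
        for v in kept:
            adj[v].add(new)
    return adj


def mean_degree(adj):
    """Average degree <d> = 2L/N of the grown network."""
    return sum(len(neighbors) for neighbors in adj) / len(adj)
```

For example, `mean_degree(grow_network(5000, 0.4, seed=1))` gives the average degree of one realization; averaging over many seeds would be needed before comparing with observed networks.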

In contrast to duplication-mutation models (see e.g. [?]), no new links are introduced. Initial conditions apparently do not affect the structure of the network once it becomes sufficiently large; in the following, we always assume that the initial network consists of two connected nodes. As in the observed protein-protein interaction networks, in this model each node has at least one link and the network remains connected throughout the evolution. These features are the main distinction between our model and earlier models (see e.g. [5]), which allowed the addition of nodes with no links and generated disconnected networks of questionable biological relevance.

The above simple rules generate networks which are strikingly similar to the naturally occurring ones. This is evident from Figs. 2-4, which compare the degree distributions of the simulated networks and the protein-protein binding networks of baker's yeast, fruit fly, and human. The protein interaction data for all three species were obtained from the Biological Association Network databases available from Ariadne Genomics [11]. The data for the human (H. sapiens) protein network were derived from the Ariadne Genomics ResNet database, constructed from various literature sources using Medscan [12]. The data for the baker's yeast (S. cerevisiae) and fruit fly (D. melanogaster) networks were constructed by combining the data from published high-throughput experiments with literature data, also obtained using Medscan [13].

FIG. 2: Degree distribution of the protein-protein binding network of yeast with $N_p = 4873$ proteins and average degree $\langle d \rangle \approx 6.6$. The link retention probability of the fitted simulated network is $\sigma \approx 0.413$.

FIG. 3: Degree distribution of the protein-protein binding network of fly with $N_p = 6954$ proteins and average degree $\langle d \rangle \approx 5.9$. The link retention probability of the fitted simulated network is $\sigma \approx 0.380$.

Each simulated degree distribution was obtained by averaging over 500 realizations. The values of the link retention probability $\sigma$ of the simulated networks were selected to make the mean degree $\langle d \rangle$ of the simulated and observed networks equal. The number of nodes and the number of links in the corresponding grown and observed networks were therefore equal as well.

Figures 2-4 demonstrate that even the most primitive form of the duplication and divergence model (which does not account for the disappearance of links from the original node, introduction of new links, removal of nodes, and many other biologically relevant processes) reproduces the observed degree distributions rather well. These figures also show that the degree distributions of both simulated and naturally occurring networks do not exactly resemble the power laws that they are commonly fitted to (see, for example, [3]). A possible explanation is that the protein-protein networks (naturally limited to a few tens of thousands of nodes) are not large enough for the degree distribution to converge to its power-law asymptotics. To probe the validity of this argument we present (Fig. 5) the degree distributions for networks of up to 10^ vertices with a link retention probability similar to those fitted to the observed networks, $\sigma = 0.45$. It follows that the degree distribution does not attain a power-law form even for very large networks, at least for the naturally occurring $\sigma < 1/2$.

FIG. 4: Degree distribution of the protein-protein binding network of human with $N_p = 5275$ proteins and average degree $\langle d \rangle \approx 5.7$. The link retention probability of the fitted simulated network is $\sigma \approx 0.375$.

FIG. 5: Degree distributions of grown networks with (bottom to top) 10*, 10^, and 10® vertices. The link retention probability is $\sigma = 0.45$; all data were averaged over 100 realizations.



III. SOLVABLE LIMITS 

Here we analyze duplication-divergence networks in the limits $\sigma = 1$ and $\sigma \to +0$, when the model is solvable and (almost) everything can be computed analytically.

A. No divergence ($\sigma = 1$)

This case has already been investigated in Refs. [5, ?, 14]. Here we outline its properties, as this will help us to pose relevant questions in the general case when divergence is present.

When $\sigma = 1$, each duplication attempt is successful and the network remains a complete bipartite graph throughout the evolution: initially it is $K_{1,1}$; at the next stage the network turns into $K_{2,1}$ or $K_{1,2}$, equiprobably; and generally, when the number of nodes reaches $N$, the network is a complete bipartite graph $K_{j,N-j}$ with every value $j = 1, \ldots, N-1$ occurring equiprobably. In the complete bipartite graph $K_{j,N-j}$ the degree of a node has one of two possible values: $j$ and $N-j$. Hence in any realization of a $\sigma = 1$ network, the degree distribution is the sum of two delta functions: $N_k(j) = j\,\delta_{k,N-j} + (N-j)\,\delta_{k,j}$. Averaging over all realizations we obtain

$$\langle N_k \rangle = \frac{2(N-k)}{N-1}. \qquad (1)$$

The total number of links $L$ in the complete bipartite graph $K_{j,N-j}$ is $L = j(N-j)$. Averaging over all $j$ we can compute any moment of $L$; for instance, the mean is equal to

$$\langle L \rangle = \frac{N(N+1)}{6} \qquad (2)$$

and the mean square is given by

$$\langle L^2 \rangle = \frac{N(N+1)(N^2+1)}{30}. \qquad (3)$$



In the thermodynamic limit $N \to \infty$, $L \to \infty$, the link distribution $P_N(L)$ becomes a function of the single scaling variable $\ell = L/N^2$, namely:

$$P_N(L) = \frac{1}{N-1} \sum_{j=1}^{N-1} \delta_{L,\,j(N-j)} \to N^{-2}\,\mathcal{P}(\ell) \qquad (4)$$

with $\mathcal{P}(\ell) = 2/\sqrt{1 - 4\ell}$. The key feature of the networks generated without divergence ($\sigma = 1$) is the lack of self-averaging. In other words, fluctuations do not vanish in the thermodynamic limit. This is evident from Eqs. (2)-(4): in the self-averaging case we would have had $\langle L^2 \rangle / \langle L \rangle^2 = 1$ (instead of the actual value $\langle L^2 \rangle / \langle L \rangle^2 = 6/5$) and the scaling function $\mathcal{P}(\ell)$ would be a delta function. The lack of self-averaging implies that the future is uncertain: the first few steps of the evolution drastically affect the outcome.
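The moments quoted for the no-divergence case can be checked directly by averaging $L = j(N-j)$ over the equiprobable values $j = 1, \ldots, N-1$. The sketch below does this with exact rational arithmetic; the helper name `link_moments` is introduced only for this illustration.

```python
from fractions import Fraction


def link_moments(n):
    """Exact <L> and <L^2> at sigma = 1, averaging L = j(n - j) over j = 1..n-1."""
    values = [j * (n - j) for j in range(1, n)]
    mean = Fraction(sum(values), n - 1)
    mean_sq = Fraction(sum(v * v for v in values), n - 1)
    return mean, mean_sq


def moment_ratio(n):
    """<L^2>/<L>^2, which would tend to 1 for a self-averaging quantity."""
    mean, mean_sq = link_moments(n)
    return mean_sq / mean**2
```

Evaluating `float(moment_ratio(n))` for growing `n` shows the ratio approaching 6/5 rather than 1, which is the signature of the missing self-averaging.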

Finally we mention that the $\sigma = 1$ limit of our model is equivalent to the classical Pólya urn model [15]. Urn models have been studied in probability theory [16], have applications ranging from biology [17] to computer science [18, 19], and remain in the focus of current research (see e.g. [20, 21] and references therein).



B. Maximal divergence ($\sigma = +0$)

Let $\sigma \ll 1$. Then in a successful duplication attempt, the probability of retaining more than one link is very small (of the order of $\sigma$). Ignoring it, we conclude that in each successful duplication event one node and only one link are added, so when $\sigma \ll 1$ the emerging networks are trees.

If the degree of the target node is $k$, the probability of a successful duplication is $1 - (1-\sigma)^k$, which approaches $\sigma k$ when $\sigma \ll 1$. Hence any of the $k$ neighbors of the target node will be linked to the potentially duplicated node with the same probability $\sigma$.

A given node $n$ links to the new, duplicated node in a process which starts with choosing a neighbor of $n$ as the target node. The probability of that is proportional to the degree $d_n$ of the node $n$. Then the probability of linking to the node $n$ is $\sigma$ (as we already established), so the probability that the new node links to $n$ is proportional to its degree $d_n$. Thus we recover the standard preferential attachment model. This model exhibits the well-known behavior: the total number of links is $L = N - 1$, and the degree distribution is a self-averaging quantity peaked around the average

$$N_k = \frac{4N}{k(k+1)(k+2)}. \qquad (5)$$
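As a quick sanity check on the preferential-attachment result, the fractions $n_k = N_k/N = 4/[k(k+1)(k+2)]$ should sum to one and give a mean degree of 2, as expected for a tree with $L = N - 1$ links. The sketch below accumulates the partial sums exactly; the names `n_k` and `partial_sums` are chosen only for this example.

```python
from fractions import Fraction


def n_k(k):
    """Fraction of nodes of degree k in the sigma = +0 limit."""
    return Fraction(4, k * (k + 1) * (k + 2))


def partial_sums(k_max):
    """Return (sum of n_k, sum of k * n_k) over k = 1..k_max as exact fractions."""
    total = sum(n_k(k) for k in range(1, k_max + 1))
    mean_deg = sum(k * n_k(k) for k in range(1, k_max + 1))
    return total, mean_deg
```

Both sums telescope, so the partial sums equal $1 - 2/[(K+1)(K+2)]$ and $2 - 4/(K+2)$ for cutoff $K$, converging to 1 and 2 respectively.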



IV. GENERAL CASE ($0 < \sigma < 1$)

We now move on to the discussion of the general case 
which is only partially understood. 



A. Self-averaging 

Self-averaging of any quantity can be probed by analyzing the relative magnitude of fluctuations of that quantity. As a quantitative measure we shall use the ratio of the standard deviation to the average. For the total number of links,

$$\chi = \frac{\sqrt{\langle L^2 \rangle - \langle L \rangle^2}}{\langle L \rangle} \qquad (6)$$

should vanish in the thermodynamic limit if the total number of links is a self-averaging quantity. A lack of self-averaging would be extremely important: it would imply that a slight deviation in the early development could lead to a very different outcome. Even if $\chi$ vanishes in the thermodynamic limit, fluctuations may still play a noticeable role if $\chi$ approaches zero too slowly.
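The ratio of the standard deviation to the average can be estimated from an ensemble of independently grown networks. The sketch below is one way to do so under stated assumptions: it re-implements the growth rules in a compact form that returns only the link count, and the names `grow_link_count`, `chi`, and `estimate_chi`, as well as the default ensemble size, are choices made for this illustration, not taken from the paper.

```python
import random
from statistics import mean, pstdev


def grow_link_count(n_nodes, sigma, rng):
    """Grow one network (0 < sigma <= 1) and return its total number of links L."""
    adj = [{1}, {0}]  # two connected nodes
    links = 1
    while len(adj) < n_nodes:
        target = rng.randrange(len(adj))
        kept = [v for v in sorted(adj[target]) if rng.random() < sigma]
        if not kept:
            continue  # failed duplication attempt
        new = len(adj)
        adj.append(set(kept))
        for v in kept:
            adj[v].add(new)
        links += len(kept)
    return links


def chi(samples):
    """Standard deviation of the samples divided by their mean."""
    return pstdev(samples) / mean(samples)


def estimate_chi(n_nodes, sigma, realizations=200, seed=0):
    """Monte Carlo estimate of the relative fluctuation of L."""
    rng = random.Random(seed)
    return chi([grow_link_count(n_nodes, sigma, rng) for _ in range(realizations)])
```

Comparing `estimate_chi(N, sigma)` for increasing `N` at fixed `sigma` indicates whether the fluctuations decay; at $\sigma = 1$ the estimate should stay near $1/\sqrt{5}$ for large `N`.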

Simulations (Fig. 6) show that the system is apparently self-averaging when $\sigma < 1/2$. It is somewhat difficult to establish what is happening in the borderline case $\sigma = 1/2$, though we are inclined to believe that self-averaging still holds. Self-averaging is evidently lost at $\sigma = 3/4$, and the system is certainly non-self-averaging for $\sigma = 1$ (in this situation $\chi = 1/\sqrt{5}$, see Eqs. (2)-(3)). These findings suggest that in the range $1/2 < \sigma \le 1$ the total number of links is not a self-averaging quantity.

FIG. 6: $\chi$ vs. $N$ for (top to bottom) $\sigma = 3/4$, $1/2$, $1/4$. The total number of links is obviously a self-averaging quantity for $\sigma = 1/4$, apparently also self-averaging for $\sigma = 1/2$, and evidently non-self-averaging for $\sigma = 3/4$.



B. Total number of links 

According to the definition of the model, a target node is chosen randomly. Therefore, the probability that a duplication event is successful or, equivalently, the average increment of the number of nodes per attempt is

$$\nu \equiv \Delta N = \sum_{k \ge 1} n_k \left[1 - (1-\sigma)^k\right], \qquad (7)$$

where $n_k = N_k/N$ is the probability for a node to have degree $k$. Similarly, the increment of the number of links per step is

$$\Delta L = \sigma \sum_{k \ge 1} k\, n_k$$

and therefore

$$\frac{dL}{dN} = \frac{\sigma \sum_{k \ge 1} k\, n_k}{\sum_{k \ge 1} n_k \left[1 - (1-\sigma)^k\right]}. \qquad (8)$$



The inequality $k\sigma \ge 1 - (1-\sigma)^k$ is valid for all $k \ge 1$ and therefore $dL/dN \ge 1$, implying

$$L \ge N - 1. \qquad (9)$$

This is obvious geometrically, as (9) should hold for any connected network.
Using Eq. (8) we can verify the self-consistency of our conclusion (5), derived in the case $\sigma = +0$. Substituting (5) in (8) we obtain

$$\frac{dL}{dN} = 1 + \sigma(-\ln \sigma - 1) + \mathcal{O}\!\left[(\sigma \ln \sigma)^2\right]. \qquad (10)$$

This confirms our assumption that for vanishing $\sigma$, each successful duplication event increments the number of links by one.
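The small-$\sigma$ expansion can be compared with a direct numerical evaluation of the ratio in Eq. (8) for the preferential-attachment distribution $n_k = 4/[k(k+1)(k+2)]$. In the sketch below the sum in the denominator is truncated at a large cutoff `k_max` (an assumption of this example; the neglected tail is bounded by $2/k_{\max}^2$), while the numerator uses the exact value $\sum_k k\,n_k = 2$. The function names are hypothetical.

```python
import math


def dl_dn_exact(sigma, k_max=200_000):
    """dL/dN from Eq. (8) with n_k = 4/[k(k+1)(k+2)], summed numerically."""
    nu = 0.0
    surv = 1.0  # running value of (1 - sigma)**k
    for k in range(1, k_max + 1):
        surv *= 1.0 - sigma
        nu += 4.0 / (k * (k + 1) * (k + 2)) * (1.0 - surv)
    # numerator: sigma * sum_k k n_k = 2 * sigma
    return 2.0 * sigma / nu


def dl_dn_small_sigma(sigma):
    """Leading terms of the small-sigma expansion, 1 + sigma(-ln sigma - 1)."""
    return 1.0 + sigma * (-math.log(sigma) - 1.0)
```

For small `sigma` the difference `dl_dn_exact(sigma) - dl_dn_small_sigma(sigma)` is positive and comparable to $(\sigma \ln \sigma)^2$, consistent with the size of the neglected correction.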

To analyze the growth of $L$ versus $N$, we use the definition (7) of $\nu$ and the identity $2L = \sum_k k N_k$, and rewrite (8) as

$$\frac{dL}{dN} = \frac{2\sigma}{\nu}\,\frac{L}{N}, \qquad (11)$$

which leads to an algebraic growth $L \sim N^{2\sigma/\nu}$. Noting that $\nu$ cannot exceed one (this follows from (7) and the sum rule $\sum_k n_k = 1$), we conclude that growth is certainly super-linear when $\sigma > 1/2$. Hence the average degree $\langle d \rangle = \sum_{k \ge 1} k\, n_k = 2L/N$ diverges with system size algebraically, $\langle d \rangle \sim N^{\alpha}$ with $\alpha = 2\sigma/\nu - 1 > 0$. Since the average degree grows indefinitely, the probability of the failure to inherit at least one link approaches zero, that is, $\nu \to 1$ as $N \to \infty$.

Therefore we anticipate that asymptotically $L \sim N^{2\sigma}$ and $\langle d \rangle \sim N^{\alpha}$ with $\alpha = 2\sigma - 1 > 0$. These expectations agree with simulations fairly well (Fig. 7). For instance, when $\sigma = 3/4$, the predicted exponent $\alpha = 1/2$ is close to the fitted one, $\alpha = 0.51$ (Fig. 7). The agreement is worse when $\sigma$ approaches $1/2$; the predicted exponent for $\sigma = 5/8$, $\alpha = 1/4$, is notably smaller than the numerical value $\alpha_{\rm numer} \approx 0.3$.
FIG. 7: The average node degree $\langle d \rangle$ vs $N$ for (bottom to top, dashed lines) $\sigma = 1/2$, $5/8$, $3/4$. Solid lines are the corresponding power-law $\langle d \rangle \sim N^{\alpha}$ best fits for the large-$N$ parts of the plots: $\alpha(\sigma = 1/2) \approx 0.16$, $\alpha(\sigma = 5/8) \approx 0.30$, $\alpha(\sigma = 3/4) \approx 0.51$. The results are averaged over 100 network realizations.

In the range $\sigma < 1/2$, we cannot establish on the basis of Eq. (11) alone whether the growth is super-linear or linear (the growth is at least linear, as follows from the lower bound (9)). The average node degree $\langle d \rangle$ grows with $N$ but apparently saturates when $\sigma$ is close to zero (see Fig. 8). For $\sigma \approx 0.3$-$0.4$ the average degree seems to grow logarithmically, that is, $L(N) \sim N \ln N$. For $\sigma = 1/2$ the growth of $\langle d \rangle$ is super-logarithmic (see Fig. 8) and can be fitted either by $\langle d \rangle \sim (\ln N)^{\beta}$ with $\beta \approx 2$, or by a power law $\langle d \rangle \sim N^{\alpha}$ with a fairly small exponent $\alpha(1/2) \approx 0.16$.
FIG. 8: The average node degree $\langle d \rangle$ vs $N$ in the self-averaging regime, $\sigma = 1/16$, $1/8$, $1/4$, $3/8$, $0.45$, $1/2$ (bottom to top). The results are averaged over 100 network realizations.

Hence, taking into account the simulation results and the limiting cases considered earlier, the behavior of $L$ can be summarized as follows:

$$L \sim \begin{cases} N^{2\sigma} & \text{for } 1/2 < \sigma \le 1; \\ N \ln N & \text{for } \sigma^* < \sigma < 1/2; \\ N & \text{for } 0 < \sigma < \sigma^*. \end{cases} \qquad (12)$$

Numerically it appears that $\sigma^* \approx 0.3$-$0.4$. In the next subsection we will demonstrate that $\sigma^* = e^{-1} = 0.367879\ldots$.



C. Degree distribution 

A rate equation for the degree distribution is derived in the same manner as Eq. (8):

$$\nu\,\frac{dN_k}{dN} = \sigma\left[(k-1)\,n_{k-1} - k\,n_k\right] + m_k. \qquad (13)$$

Here we have used the shorthand notation

$$m_k = \sum_{s \ge k} n_s \binom{s}{k} \sigma^k (1-\sigma)^{s-k} \qquad (14)$$

for the probability that the new node acquires degree $k$. The general term in the sum on the right-hand side of Eq. (14) describes a duplication event in which $k$ links remain and $s - k$ links are lost due to divergence.
Summing both sides of (13) over all $k \ge 1$ we obtain $\nu$ on the left-hand side. On the right-hand side, only the second term contributes to the sum and also gives the same $\nu$:

$$\sum_{k \ge 1} m_k = \sum_{s \ge 1} n_s \sum_{k=1}^{s} \binom{s}{k} \sigma^k (1-\sigma)^{s-k}$$
$$= \sum_{s \ge 1} n_s \left[1 - (1-\sigma)^s\right] = \nu,$$

where the second line was derived using the binomial identity. Similarly, multiplying (13) by $k$ and summing over all $k \ge 1$ we recover (11). These two checks show the consistency of (13) with the growth equations introduced earlier.
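The two binomial sums behind these consistency checks are easy to confirm with exact arithmetic: for a target node of degree $s$, the inherited degree $k$ is binomially distributed, so summing over $k \ge 1$ gives $1 - (1-\sigma)^s$, and weighting by $k$ gives $s\sigma$. The helper name `inheritance_sums` below is introduced only for this sketch.

```python
from fractions import Fraction
from math import comb


def inheritance_sums(s, sigma):
    """Return (sum_k p_k, sum_k k * p_k) over k = 1..s,

    where p_k = C(s, k) sigma^k (1 - sigma)^(s - k) is the probability that
    the replica of a degree-s node keeps exactly k links.
    """
    probs = [comb(s, k) * sigma**k * (1 - sigma) ** (s - k) for k in range(1, s + 1)]
    zeroth = sum(probs)
    first = sum(k * p for k, p in zip(range(1, s + 1), probs))
    return zeroth, first
```

Passing `sigma` as a `Fraction`, for instance `inheritance_sums(7, Fraction(3, 8))`, keeps the comparison exact.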
FIG. 9: The degree distribution $n_k$ vs. $k$ for (bottom to top) $\sigma = 1/4$, $\sigma = 1/2$, and $\sigma = 3/4$. The size of the network is N = 10'' for $\sigma = 1/4$, N = 5 x 10"* for $\sigma = 1/2$, and N = 10* for $\sigma = 3/4$. The results are averaged over 100 realizations.

Since $\nu$ depends on all $n_k$, see (7), Eqs. (13) are non-linear. However, the observations made in the previous subsection allow us to treat $\nu$, for any given $\sigma$, as a parameter, thus ignoring its possible very slow dependence on $N$. The resulting linear Eqs. (13) are still very complicated: if we assume that $k \gg 1$ and employ the continuous approach, we are still left with a system of partial differential equations with a non-local "source" term $m_k$. Fortunately, the summand in $m_k$, that is $g(s,k) = \binom{s}{k} \sigma^k (1-\sigma)^{s-k}$, is sharply peaked around $s \approx k/\sigma$ [5]. Hence we can replace $\sum_{s \ge k} n_s\, g(s,k)$ by $n_{k/\sigma} \sum_{s \ge k} g(s,k) = \sigma^{-1} n_{k/\sigma}$, and Eqs. (13) become

$$\nu\,\frac{dN_k}{dN} + \sigma\,\frac{\partial}{\partial k}\left(k\,n_k\right) = \sigma^{-1}\, n_{k/\sigma}. \qquad (15)$$

Still, the analysis of (15) is hardly possible without knowing the correct scaling. Figure 9 indicates that the form of the degree distribution varies with $\sigma$ significantly. We will proceed (separately for $0 < \sigma < 1/2$ and $1/2 < \sigma < 1$) by guessing the scaling and trying to justify the consistency of the guess.
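The two properties of the summand used in this approximation, its normalization $\sum_{s \ge k} g(s,k) = 1/\sigma$ and its peak near $s \approx k/\sigma$, can be checked numerically. The sketch below builds $g(s,k)$ through the recurrence $g(s+1,k)/g(s,k) = (1-\sigma)(s+1)/(s+1-k)$; the function name `source_kernel` and the finite cutoff `s_max` are assumptions of this example.

```python
def source_kernel(k, sigma, s_max):
    """Return {s: g(s, k)} for s = k..s_max, with
    g(s, k) = C(s, k) sigma^k (1 - sigma)^(s - k), built by a recurrence."""
    g = sigma**k  # g(k, k)
    values = {k: g}
    for s in range(k, s_max):
        g *= (s + 1) / (s + 1 - k) * (1 - sigma)
        values[s + 1] = g
    return values


def kernel_summary(k, sigma, s_max):
    """Return (truncated sum over s, position of the maximum)."""
    values = source_kernel(k, sigma, s_max)
    peak = max(values, key=values.get)
    return sum(values.values()), peak
```

For example, `kernel_summary(20, 0.25, 2000)` returns a sum close to $1/\sigma = 4$ and a peak position within one unit of $k/\sigma = 80$.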



1. $0 < \sigma < 1/2$

Assuming the simplest linear scaling $N_k \sim N$, we reduce Eq. (15) to

$$2 n_k + \frac{d}{dk}\left(k\, n_k\right) = \sigma^{-2}\, n_{k/\sigma}. \qquad (16)$$

We also used $\nu = 2\sigma$, which is required to ensure that $L \sim N$ [24] is consistent with (11). Plugging $n_k \sim k^{-\gamma}$ into (16) we obtain

$$\gamma = 3 - \sigma^{\gamma - 2}. \qquad (17)$$

This equation has two solutions: $\gamma = 2$ and a non-trivial solution $\gamma(\sigma)$ which depends on $\sigma$. The second solution $\gamma(\sigma)$ decreases from $\gamma(0) = 3$ to $\gamma(1/2) = 1$. The two solutions coincide at $\sigma^* = e^{-1} = 0.367879$. The sum $\sum_k k\, n_k$ converges when $\gamma > 2$, and the total number of links grows linearly, $L \sim N$. Apparently the appropriate solution is the one which is larger: for $\sigma < e^{-1}$ the exponent is $\gamma(\sigma)$, while for $\sigma > e^{-1}$ the exponent is $\gamma = 2$ (Fig. 10). In the latter case,

$$\sum_k k\, n_k \sim \sum_{k \le k_{\max}} k^{-1} \sim \ln k_{\max} \sim \ln N$$

and therefore the total number of links grows as $N \ln N$.

FIG. 10: The degree distribution exponent $\gamma(\sigma)$ from Eq. (17).
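The non-trivial root of $\gamma = 3 - \sigma^{\gamma-2}$ has no closed form in general, but it is easy to locate numerically. The sketch below uses plain bisection, bracketing the root above $\gamma = 2$ for $\sigma < e^{-1}$ and below it for $e^{-1} < \sigma < 1/2$; the function names, the bracketing offset `eps`, and the fallback to $\gamma = 2$ when the two roots are numerically indistinguishable are choices made for this illustration.

```python
import math


def gamma_nontrivial(sigma, tol=1e-12):
    """Non-trivial root of gamma = 3 - sigma**(gamma - 2) for 0 < sigma < 1/2."""
    if not 0.0 < sigma < 0.5:
        raise ValueError("expects 0 < sigma < 1/2")

    def f(g):
        return g - 3.0 + sigma ** (g - 2.0)

    eps = 1e-6
    if sigma < math.exp(-1.0):
        lo, hi = 2.0 + eps, 3.0  # root lies above the trivial root gamma = 2
    else:
        lo, hi = 1.0, 2.0 - eps  # root lies below gamma = 2
    if f(lo) * f(hi) > 0.0:
        return 2.0  # the two roots have (numerically) merged near sigma = 1/e
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def selected_exponent(sigma):
    """The larger of the two solutions, as argued in the text."""
    return max(2.0, gamma_nontrivial(sigma))
```

With this helper, `selected_exponent(1/8)` and `selected_exponent(1/4)` can be compared with the values 2.817187 and 5/2 quoted in the text, while any $\sigma$ between $e^{-1}$ and $1/2$ returns 2.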

Simulations show that for small $\sigma$ the degree distribution $n_k$ indeed has a fat tail (see Fig. 11). The agreement with the theoretical prediction of the algebraic tail is very good when $\sigma = 1/8$ (Eq. (17) gives $\gamma = 2.817187$, while numerically $\gamma_{\rm numer} \approx 2.82$), not so good when $\sigma = 1/4$ ($\gamma = 5/2$ vs. $\gamma_{\rm numer} \approx 2.7$), and fair at best for $\sigma = 3/8$.

Thus we have explained the growth law (12). We have also arrived at a theoretical prediction of $\sigma^*$ which agrees reasonably well with the simulation results. Due to the presence of logarithms, the convergence is extremely slow and better agreement will probably be very hard to achieve. Finally we note that the behaviors $L \sim N \ln N$ and $n_k \sim k^{-2}$ arise in a surprisingly large number of technological and social networks (see [?] and references therein).

FIG. 11: $n_k$ vs $k$ for the network of size $N = 10^5$ in the self-averaging regime, $\sigma = +0$, $1/8$, $1/4$, $3/8$, $0.45$ (bottom to top). The result for $\sigma = +0$ is the exact solution (5); simulation data are averaged over 100 realizations. The corresponding analytical predictions for the exponent are $\gamma(\sigma = 1/8) = 2.817187$, $\gamma(\sigma = 1/4) = 5/2$, and $\gamma(\sigma = 3/8) = \gamma(\sigma = 0.45) = 2$.

2. $1/2 < \sigma < 1$




FIG. 12: Scaling of the degree distribution in networks of $N = 100$, $N = 1000$, and $N = 10000$ nodes with $\sigma = 3/4$.

FIG. 13: Using a multitude of direct and indirect methods, von Mering et al. [?] predicted 78928 links between 5397 yeast proteins, which produces a network with the average degree $\langle d \rangle \approx 29.2$. A power-law fit to this degree distribution has the exponent $\gamma \approx 1.1$.

The growth law (12) suggests the introduction of a scaling form $N_k = N^{2-2\sigma} F(x)$ with $x = k/N^{2\sigma-1}$. Then the sum rules $\sum_k N_k = N$ and $\sum_k k N_k \sim N^{2\sigma}$ are manifestly satisfied (provided that the scaling function $F(x)$ falls off reasonably fast for $x \to \infty$). Simulation results (see Fig. 12) are in good agreement with the above scaling form.
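The two sum rules for the scaling form can be made explicit by replacing the sums over $k$ with integrals over $x = k/N^{2\sigma-1}$. The short derivation below is only a sketch and assumes that $F$ and $xF$ are integrable:

```latex
\sum_k N_k \simeq N^{2-2\sigma} \int_0^\infty F\!\left(\frac{k}{N^{2\sigma-1}}\right) dk
  = N^{2-2\sigma}\, N^{2\sigma-1} \int_0^\infty F(x)\, dx
  = N \int_0^\infty F(x)\, dx ,
\qquad
\sum_k k\, N_k \simeq N^{2-2\sigma}\, N^{2(2\sigma-1)} \int_0^\infty x\, F(x)\, dx
  = N^{2\sigma} \int_0^\infty x\, F(x)\, dx .
```

The first rule therefore fixes the normalization $\int_0^\infty F(x)\,dx = 1$, and the second reproduces $2L \sim N^{2\sigma}$.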



V. CONCLUSIONS 

We have shown that a simple one-parameter duplication-divergence network growth model approximates realistic protein-protein networks well. Table I summarizes how the major network features (self-averaging, evolution of the number of links $L(N)$, the degree distribution $n_k$) change when the link retention probability $\sigma$ varies.

The two most striking features of duplication-divergence networks are the lack of self-averaging for $\sigma > 1/2$ and the extremely slow growth of the average degree for $\sigma < 1/2$. These features have very important biological implications: the lack of self-averaging naturally leads to diversity among the grown networks, and the slow degree growth preserves the sparse structure of the network. Both of these effects occur in wide ranges of the parameter $\sigma$ and are therefore robust; it is hard to expect that nature would have been able to fine-tune the value of $\sigma$ if it were not so.

Our findings indicate that in the observed protein-protein networks $\sigma \approx 0.4$, so biologically relevant networks seem to be in the self-averaging regime. One must, however, take the experimental protein-protein data with a great degree of caution: it is generally acknowledged that our understanding of protein-protein networks is quite incomplete. Usually, as new experimental data become available, the number of links and the average degree in these networks increase. Hence the currently observed degree distributions may reflect not an intrinsic property of protein-protein networks, but a measure of the incompleteness of our knowledge about them. Therefore the possibility that the real protein-protein networks are not (or have not been at some stage of the evolution) self-averaging is not excluded.
$\sigma$                   self-averaging   $L(N)$                $n_k$
$\sigma = 1$               No               $N(N+1)/6$            $2(N-k)/[N(N-1)]$
$1/2 < \sigma < 1$         No               $\sim N^{2\sigma}$    $\sim N^{1-2\sigma} F(k/N^{2\sigma-1})$
$e^{-1} < \sigma < 1/2$    Yes              $\sim N \ln N$        probably $\sim k^{-2}$
$0 < \sigma < e^{-1}$      Yes              $\sim N$              $\sim k^{-\gamma(\sigma)}$
$\sigma = +0$              Yes              $N - 1$               $4/[k(k+1)(k+2)]$

TABLE I: The behavior of the duplication-divergence network for different values of the probability $\sigma$ to inherit a link. Here $L(N)$ is the average number of links for a given number of nodes $N$, $n_k$ is the average fraction of nodes of degree $k$, and the exponent $\gamma(\sigma) > 2$ is defined by the equation $\gamma = 3 - \sigma^{\gamma-2}$.

It has been suggested that randomly introduced links (mutations) must complement the inherited ones to ensure self-averaging and the existence of a smooth degree distribution [7]. While a lack of random linking does affect the fine structure of the resulting network, we have observed that the major features like self-averaging, growth law, and degree distribution are rather insensitive to whether random links are introduced or not, provided that the number of such links is significantly less than the number of inherited ones. We performed a number of simulation runs where links between a target node and its image were added at each duplication step with a probability $P_d$. Introduction of such links is the most direct way to prevent partitioning of the network into a bipartite graph (see [?]). In other words, without such links the target and duplicated nodes are never directly connected to each other. We observed that for reasonable values of $P_d < 0.1$ (in the observed yeast, fly, and human protein-protein networks $P_d$ never exceeds this value) the results remain unaffected. Apparently, without randomly introduced links, the network characteristics establish themselves independently in every subset of vertices duplicated from each originally existing node. We leave a more systematic study of the effects of mutations, as well as of more symmetric divergence scenarios (when links may be lost on both the target and the duplicated node), for the future.

Many unanswered questions remain even in the realm of the present model. For instance, little is known about the behavior of the system in the borderline cases $\sigma = 1/2$ and $\sigma = e^{-1}$. One also wants to understand better the tail of the degree distribution in the region $\sigma > e^{-1}$, where $L(N)$ follows unusual scaling laws. It will also be interesting to study possible implications of these results for probabilistic urn models [?].